
 value function estimate



Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information

Neural Information Processing Systems

This property can be observed directly from Eq. 2 when integration is replaced by summation. Closed-loop: First, we construct the standard, closed-loop α-vectors, which represent the value function under closed-loop dynamics [1, 5]. Each point in the scatter plot represents a paired experiment with identical target dynamics.
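Where the excerpt mentions constructing the standard closed-loop α-vectors, the idea is that the value function is represented as V(b) ≈ max_α α·b over a set of vectors produced by point-based backups. The following is a minimal illustrative sketch of such a backup, not the paper's implementation; the tensors T, O, R and the discount gamma are assumed inputs.

```python
import numpy as np

def backup_alpha(b, alpha_set, T, O, R, gamma):
    """One point-based backup at belief b, returning a closed-loop alpha-vector
    so that V(b) ~= max over the alpha set of (alpha . b).

    Assumed (illustrative) inputs:
      T[a][s, s'] : transition probabilities
      O[a][s', o] : observation probabilities
      R[a][s]     : immediate reward
    """
    best_alpha, best_value = None, -np.inf
    for a in range(len(T)):
        alpha_a = R[a].astype(float)
        for o in range(O[a].shape[1]):
            # g_i(s) = sum_{s'} T[a][s, s'] O[a][s', o] alpha_i(s')
            g = np.array([T[a] @ (O[a][:, o] * alpha) for alpha in alpha_set])
            # Pick the successor alpha-vector that is best at this belief.
            alpha_a = alpha_a + gamma * g[np.argmax(g @ b)]
        value = alpha_a @ b
        if value > best_value:
            best_alpha, best_value = alpha_a, value
    return best_alpha
```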








Wasserstein Adaptive Value Estimation for Actor-Critic Reinforcement Learning

Baheri, Ali, Sharooei, Zahra, Salgarkar, Chirayu

arXiv.org Machine Learning

We present Wasserstein Adaptive Value Estimation for Actor-Critic (WAVE), an approach to enhance stability in deep reinforcement learning through adaptive Wasserstein regularization. Our method addresses the inherent instability of actor-critic algorithms by incorporating an adaptively weighted Wasserstein regularization term into the critic's loss function. We prove that WAVE achieves $\mathcal{O}\left(\frac{1}{k}\right)$ convergence rate for the critic's mean squared error and provide theoretical guarantees for stability through Wasserstein-based regularization. Using the Sinkhorn approximation for computational efficiency, our approach automatically adjusts the regularization based on the agent's performance. Theoretical analysis and experimental results demonstrate that WAVE achieves superior performance compared to standard actor-critic methods.
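The abstract's core ingredient, an adaptively weighted, Sinkhorn-approximated Wasserstein term added to the critic's loss, can be sketched roughly as below. The function names, the one-dimensional batch-distribution formulation, and the placeholder weight `lam` are assumptions for illustration, not the authors' code or their adaptation rule.

```python
import torch

def sinkhorn_distance(x, y, eps=0.1, n_iters=50):
    """Entropy-regularized (Sinkhorn) approximation of the Wasserstein distance
    between two 1-D empirical distributions x and y with uniform weights."""
    C = (x.view(-1, 1) - y.view(1, -1)) ** 2           # pairwise squared-distance cost
    K = torch.exp(-C / eps)                             # Gibbs kernel
    a = torch.full((x.numel(),), 1.0 / x.numel())
    b = torch.full((y.numel(),), 1.0 / y.numel())
    u, v = torch.ones_like(a), torch.ones_like(b)
    for _ in range(n_iters):                            # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.t() @ u)
    P = u.view(-1, 1) * K * v.view(1, -1)               # approximate transport plan
    return (P * C).sum()

def wave_critic_loss(q_pred, td_target, lam):
    """Hypothetical critic objective: TD mean-squared error plus an adaptively
    weighted Sinkhorn regularizer between the batch distributions of predictions
    and targets; `lam` stands in for the adaptive weight described in the abstract."""
    mse = torch.mean((q_pred - td_target) ** 2)
    return mse + lam * sinkhorn_distance(q_pred, td_target.detach())
```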


Simulation-Based Optimistic Policy Iteration For Multi-Agent MDPs with Kullback-Leibler Control Cost

Nakhleh, Khaled, Eksin, Ceyhun, Ekin, Sabit

arXiv.org Artificial Intelligence

This paper proposes an agent-based optimistic policy iteration (OPI) scheme for learning stationary optimal stochastic policies in multi-agent Markov Decision Processes (MDPs), in which agents incur a Kullback-Leibler (KL) divergence cost for their control efforts and an additional cost for the joint state. The proposed scheme consists of a greedy policy improvement step followed by an m-step temporal difference (TD) policy evaluation step. We use the separable structure of the instantaneous cost to show that the policy improvement step follows a Boltzmann distribution that depends on the current value function estimate and the uncontrolled transition probabilities. This allows agents to compute the improved joint policy independently. We show that both the synchronous (entire state space evaluation) and asynchronous (a uniformly sampled set of substates) versions of the OPI scheme with finite policy evaluation rollout converge to the optimal value function and an optimal joint policy asymptotically.
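As a rough single-agent illustration of the two steps described above, a Boltzmann-form policy improvement built from the uncontrolled transition kernel and the current value estimate, followed by an m-step TD-style evaluation, one might write something like the following. The multi-agent factorization, the asynchronous sampling of substates, and the exact rollout scheme are simplified away; all names and parameters are assumptions, not the paper's notation.

```python
import numpy as np

def kl_policy_improvement(V, P0):
    """Greedy improvement under a KL control cost: the improved policy is a
    Boltzmann reweighting of the uncontrolled kernel P0 by exp(-V).
    P0[s, s'] : uncontrolled transition probabilities; V : current value estimate."""
    weights = P0 * np.exp(-V)[None, :]                 # unnormalized Boltzmann weights
    return weights / weights.sum(axis=1, keepdims=True)

def m_step_td_evaluation(V, policy, state_cost, P0, m=5, alpha=0.1, gamma=1.0):
    """m-step TD-style evaluation of the improved policy, where the instantaneous
    cost is the joint-state cost plus the KL cost of deviating from P0.
    (Synchronous full-sweep version; the paper also covers an asynchronous variant.)"""
    kl_cost = np.sum(policy * np.log(np.clip(policy / P0, 1e-12, None)), axis=1)
    for _ in range(m):
        # Expected one-step cost plus expected next value under the improved policy.
        target = state_cost + kl_cost + gamma * (policy @ V)
        V = V + alpha * (target - V)
    return V
```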